Post‐glacial colonization of the Fennoscandian coast by a plant parasitic insect with an unusual life history

Abstract Species that exhibit very peculiar ecological traits combined with limited dispersal ability pose a challenge to our understanding of ecological and evolutionary mechanisms. This is especially true when they have managed to spread over long distances, overcome physical barriers, and colonize large areas. Climate and landscape changes, trophic web relations, as well as life history all interact to shape migration routes and present‐day species distributions and their population genetic structures. Here we analyzed the post‐glacial colonization of northern Europe by the gall midge Contarinia vincetoxici, which is a monophagous parasite on the perennial herb White swallowwort (Vincetoxicum hirundinaria). This insect not only has a narrow feeding niche but also limited dispersal ability and an exceptionally long dormancy. Gall midge larvae (n = 329) were collected from 16 sites along its distribution range in Denmark, Sweden, and Finland. Using microsatellite loci and knowledge of the species and the regions' history, we investigated the role of landscape change, host plant distribution, insect population dynamics, and life history in shaping the population genetic structure of the insect. We devoted particular interest to the role of the insect's presumed poor dispersal capacity in combination with its exceptionally extended diapause. We found significant levels of local inbreeding (95% highest posterior density interval = 0.42–0.47), low‐level within‐population heterozygosity (mean H E = 0.45, range 0.20–0.61) with private alleles in all populations except two. We also found significant (p < .001) regional isolation‐by‐distance patterns, suggesting regularly recurring mainly short‐distance dispersal. According to approximate Bayesian computations, C. vincetoxici appears to have colonized the study area via wind‐aided flights from remote areas approximately 4600–700 years before present when the land has gradually risen above the sea level. Extremely long dormancy periods have allowed the species to “disperse in time”, thereby aiding population persistence despite generally low census population sizes.


| INTRODUC TI ON
The northern European flora and fauna is relatively young, having gradually colonized the region since the last glaciation (Hewitt, 1999). The establishment success of colonizing species depends on the presence of suitable living conditions, such as climate, food resources, and habitats. However, the colonization success is also affected by the specific biology and life history of the colonizers; their migratory habits, generation time, and reproductive potential (Mayer et al., 2015). Therefore, whereas some species colonized northern Europe long ago (soon after the ice sheet retreated), other species are later arrivals as they may be limited by poorer dispersal capacity or unsuitable living conditions. Many species are also dependent on prior arrival by other organisms. This is, for example, illustrated by insect species that are confined to a parasitic life inside a single host plant species where the biology and spread of the host plant set the limits for the focal species (Goczal et al., 2020;Mayer et al., 2015).
Here we investigate the post-glacial colonization process of a plant-parasitic insect Contarinia vincetoxici (Diptera, Cecidomyiidae) with an unusual life history. This monophagous gall midge is a tiny insect whose larva lives in flower galls of the perennial herb White Swallowwort, Vincetoxicum hirundinaria (Gentianales, Apocynaceae; Widenfalk et al., 2002). The insect's life is characterized by a brief period of active life and growth followed by a very long inactivity period (Solbreck & Widenfalk, 2012). The active period commences when the shortlived adults (1-2 days) emerge in early summer. They fly and mate soon after emergence, and females then oviposit in young flower buds that are transformed into galls where the larvae develop ( Figure 1a). Each gall contains about 15 larvae, and it appears that females can produce 1-2 galls each. The larvae feed for about 2 weeks before they leave the gall to spin a cocoon in the soil (Widenfalk et al., 2002). This is the beginning of the exceptionally long inactivity period spent in diapause (Danks, 1992) with a median duration of at least 6 years (range 2-13 or even more years; Solbreck & Widenfalk, 2012). The many age classes of diapausing gall midge larvae form a dormant population in the soil, which acts as an effective buffer against local extinctions (Solbreck & Widenfalk, 2012Widenfalk & Solbreck, 2005). Another important aspect of C. vincetoxici life history is its limited dispersal ability. Like other gall midge species (Sylvén, 1970) adults are poor fliers with very limited power to move upwind in search for host plants. Earlier studies of C. vincetoxici show that colonization of empty host plant patches is uncommon even at short distances from inhabited patches (Solbreck & Widenfalk, 2020). However, from a postglacial timescale perspective, we cannot rule out long distance and/or wind-aided flights in C. vincetoxici. Other gall midge species are known to enter the aeroplankton and to fly high above the ground (Johnson, 1969;Sylvén, 1970). To summarize, this gall midge is a slow and somewhat unpredictable colonizer but a sturdy survivor.
The long-lived host plant is found in rocky sun-exposed habitats (Figure 1b,c). It often has a distinct patchy distribution on the landscape scale (Solbreck, 2012). Its geographical range covers most of Europe and parts of northern Asia with a north-western distribution border along the south-eastern coast of Sweden, eastern Denmark, and the south-western corner of Finland (Donadille, 1965;Hultén & Fries, 1986). The gall midge has a more limited distribution range than its host plant and it is found in northern Italy, western Germany, Czechia, Slovakia, Austria, Switzerland, Belgium, Denmark, Sweden, and Finland (Skuhravá, 1986;Skuhravá et al., 2006;Skuhravá & Skuhravý, 2010;Widenfalk et al., 2002). Two distribution areas of the gall midge can be discerned. First, a central area around the Alpine countries and central Europe, and secondly, an arm extending along the western side of the Baltic Sea in Sweden and Denmark moving into the south-western tip of mainland Finland ( Figure 2).
Along the "Fennoscandian arm" the insect distribution today closely tracks the distribution limit of the host plant, and the insect is found in most places where the host plant grows.
However, host plant distribution has evidently changed much during the post-glacial period. Whereas the plant is believed to have colonized much of the south-eastern Swedish coast already 7000-8000 BP (Sterner, 1922), its further distribution (from Lake Mälaren region to SW Finland) was initially hindered. Much of the northernmost part of our study area was submerged and has gradually risen above the sea level during the last 7000 years (Harff & Meyer, 2011;Karlsson, 2007). This process has affected plant spread (Muola et al., 2021) with likely effects on the geographic structuring of gall midge populations.
Colonization processes can be detected by analyzing the variation of neutral molecular markers, e.g., microsatellite loci (simple sequence tandem repeats of nuclear DNA), which are routinely and effectively employed for such a purpose. Multilocus genotypes mirror the diversity of founders, dispersal pathways, and divergence times of populations (Austerlitz et al., 1997). Based on the gall midge's unique biology and the distribution pattern of its host plant, we aimed to (1) infer population structure and spatial differentiation of populations of the species and (2) reconstruct their colonization history using a coalescent-based approximate Bayesian computation (Cornuet et al., 2014). Finally, we aimed to (3) analyze indices of genetic variation that allow conclusions to be drawn about the history of the species and also reveal other biological phenomena (e.g., inbreeding) in the study system. We hypothesized that the insect has colonized the Baltic coast in Fennoscandia through a gradual expansion in a stepping stone-like manner (Kimura & Weiss, 1964). We expected that the gall midge has colonized new areas through very slow and successive local encounters of new suitable habitat patches that have emerged along the coast as the land rose above the sea level. However, we also expected signs of rare long-distance colonization events, creating pockets of unique genotype combinations. From the abovementioned colonization hypothesis, we derived two predictions regarding genetic variation. First, we predicted a correlation between genetic and geographic distances only at short distances with a breakdown of correlation at larger distances, but with pockets of genetic variants disrupting the general isolation-bydistance pattern. Second, we predicted a higher genetic variability in older introductions as a consequence of accumulation of novel mutations.

| Sampling of gall midge populations
A total of 329 galls were collected from 16 sites ranging from SW Finland to SE and S Sweden and E Denmark (Table 1; Figure 2). For the analysis, one larva was chosen from each gall. Populations were sampled in 2012 or 2015 except for three sites (Vad, Sol, Ror) which were sampled in 1991. This sampling covers the current distribution of the host plant and gall midge populations in Fennoscandia. When galls were few at a site, they were all collected. When they were more numerous, collecting efforts were spread across the entire host plant patch.

| DNA extraction and microsatellites
Developing larvae were dissected out from galls, and DNA was then extracted from one larva per gall using a Qiagen DNeasy Tissue Kit (Qiagen) according to manufacturer instructions. Microsatellite sequences of C. vincetoxici were isolated by ecogenics GmbH using F I G U R E 2 Location of sampled gall midge Contarinia vincetoxici populations and the known current distribution area of its host plant Vincetoxicum hirundinaria along the Baltic coast in Fennoscandia. Post-glacial shoreline of the Littorina Sea depicts the area where occurrence of both species was impossible in this historical period. the high-throughput genomic sequencing approach described by Abdelkrim et al. (2009). A total of 1 μg of genomic DNA was analyzed on a Roche 454 GS-FLX platform (Roche) using a 2/16th run and the GS FLX titanium reagents. The total 94,738 reads had an average length of 288 base pairs. Of these, 1761 contained a microsatellite insert with a tetra-or a trinucleotide of at least six repeat units or a dinucleotide of at least 10 repeat units. Altogether, 733 reads containing microsatellite inserts were suitable for primer design, 30 of which were randomly selected and tested for polymorphism. Finally, after excluding loci that did not amplify well or were difficult to interpret, the samples were genotyped using a set of 15 polymorphic loci amplified in three multiplex PCR (Appendix S1). The fragment analysis was performed at SciLifeLab Uppsala. Fragment lengths in electropherograms were determined by comparison of the detected true peak signals against the LIZ500 size standard (Applied Biosystems) using PeakScanner (Applied Biosystems) and manual editing. To ensure correct sizing of alleles, we re-amplified any loci that did not show clear peaks at least once. If some individuals were still missing a peak signal, the locus in question was treated as missing data.
To test for genotypic linkage disequilibria among loci, we used contingency tables and an unbiased estimate of the exact probability obtained by Markov Chain Monte Carlo simulations and 10,000 permutations using Arlequin 3.5 (Excoffier et al., 2005). We also checked microsatellite loci for Hardy-Weinberg equilibrium (HWE) and frequency of null alleles was estimated using methods of Chakraborty et al. (1994) and Brookfield et al. (1996) implemented in the "PopGenReport" 3.0.4 package (Adamack & Gruber, 2014) of the R 3.6.3 software (R Core Team, 2020). HWE was tested for each combination of sampling site and locus by the Chi-square test with the Yates continuity correction and with Bonferroni adjustment to prevent type I errors (α = 0.05/240). To avoid biased estimates of null alleles, we used the INEst 2.2 software (Chybicki & Burczyk, 2009) for simultaneous estimation of null allele frequencies and of the inbreeding coefficient to avoid biased estimates of null alleles.
We initially explored genetic relationship between populations using an unrooted neighbor-joining tree based on shared allele distances (maximum-likelihood with 1000 bootstrap replicates), calculated in the software Populations 1.2.33 (Langella, 1999) and constructed in the R package "ape" 5.3 (Paradis & Schliep, 2019). To test for isola-

| Population structure analysis
To find genetically homogeneous groups of individuals in our samples, we used an individual-based clustering method implemented in the software Structure 2.3.4 (Hubisz et al., 2009;Pritchard et al., 2000). We ran the admixture model with correlated allele frequencies without the prior population information and degree of admixture α = 1. For each value of K (range 1-16), we conducted 20 independent runs with uniform priors using a burn-in of 100,000 iterations followed by 100,000 Markov chain Monte Carlo iterations. The number of genetic clusters K in the data set was inferred in two steps. Firstly we used the ΔK method (Evanno et al., 2005), which finds the breakpoint in the slope of the likelihood distribution for different K values, using the Structure Harvester Web 0.6.94 (Earl & von Holdt, 2012). Thereafter, we identified the stable K solutions also through Q-matrix correlations (average maximum correlation coefficient and the rows-andcolumns method) implemented in the R package "CorrSieve" 1.6-9 (Campana et al., 2011), which helps to identify anomalous runs.
In our structure analysis, we adopted the hierarchical approach (Evanno et al., 2005). Thus, the uppermost hierarchical level of detected optimal K was also analyzed independently to detect potential substructure within inferred main clusters (K ranged from 1 to a total number of sampling sites included in the analyzed cluster). Outputs of the Structure analysis were visualized with the Clumpak program (Kopelman et al., 2015).
To compare variability among determined genetically homogenous clusters, we used four indices of individual heterozygosity, which are relevant for populations with high inbreeding (Aparicio et al., 2006): (1) proportion of heterozygous loci (PHt), (2) standardized heterozygosity based on the mean expected heterozygosity (Hs_exp), (3) internal relatedness (IR), and (4) homozygosity by locus (HL) calculated by the R function GENHET (Coulon, 2010). We determined differences among groups by the nonparametric Kruskal-Wallis ANOVA with a post hoc Dunn test for multiple comparisons in the R package "FSA" 0.8.30 (Ogle et al., 2020).

| Inferring colonization scenario
To estimate divergence times and population history of gall midge populations, we used a coalescent-based Approximate Bayesian Computation (ABC) algorithm in the program DIYABC 2.1.0 (Cornuet et al., 2014). We assigned the sampling sites to genetic clusters identified by the Structure analysis. The scenarios of the compared coalescent models, which we defined as the most likely, were based on the following three assumptions or hypotheses.
Our first assumption was that colonization of the area was gradual (stepping stone model; Kimura & Weiss, 1964) as the land was uncovered after the sea level retreated, thus, the species progressed from south to north (from cluster I to V; scenario 1 of Figure S1). The second assumption was that colonization began in the area where the highest diversity was found (cluster II or clusters II + III), indicating a long-term effect of mutations in a large ancestral population (scenarios 2-6). Lastly, we added a competing hypothesis where the population with the highest variation resulted from the admixture of different independently introduced lineages from an unknown source that gradually colonized the territory from different directions (scenarios 7-10). For the second and third assumptions, there are several alternative scenarios that differ in the times of divergence or admixture of different clusters. From these hypotheses, we selected a set of 10 hypothetical or possible evolutionary scenarios ( Figure S1), which were designed based on the results of the Structure analysis, and we simulated demographic parameters for inferred genetic clusters. Since the DIYABC software is strongly influenced by differences in sample size between populations, we selected 25 individuals for each cluster using random sub-sampling with respect to the smallest cluster. We explored the coales- was set in a range from 10 to 1500 which corresponds better to our field estimates. An initial size reduction (bottleneck) is expected when a new populaton is derived from an ancestral population, since this new population generally starts with few founders -a typical feature of the study system. Therefore, we set also an initial size reduction in different scenarios with the number of founders (N) in a range from 1 to 100. Priors for divergence times (in generations) were set as follows: t1 = 10-500, t2 = 100-500, t3 = 100-1000, t4 and t5 = 500-3000; while t5 > t4, t5 > t3, t5 > t2, t5 > t1, t4 > t3, t4 > t2, t4 > t1, t3 > t2, t3 > t1 and t2 > t1. An admixture rate

| Genetic variation
We successfully genotyped 329 larvae of C. vincetoxici (4-34 individuals per population) using 15 microsatellite loci. In total, 3.2% of data was missing. However, this was particularly due to the three populations with the smallest sample size (Vad, Ror, Sol) where on average 5 out of 15 loci were not amplified. Significant linkage disequilibria were found for almost all loci combinations (Table S1).
However, there was no consistency among populations, and no locus stood out more than another, thus we interpret the pattern as being a result of a biological phenomenon rather than a pattern created by physical linkage among loci. Significant departures (p < .001) from HWE were revealed in half of the populations at 1-6 loci (Table S2).
Although there was also evidence of null alleles at all loci (Table S3) (Table S3). However, because deviations from HWE between loci in different populations were completely random (Table S2), we left the dataset as is for further analyses and did not compensate for null alleles.    Figure S3). Since the maximum average Q-matrix correlation also yielded a stable solution for K = 3 (R = .99, p < .05), we therefore assumed two genetic clusters around the lake Mälaren in central Sweden (clusters III and IV) and one cluster of populations from southern Finland (cluster V; Figure 4c). Finally, we found five genetic clusters, derived from two main clusters, which followed a latitudinal pattern along the Baltic Sea coast (Figure 4d). Three populations located in SW Sweden (Hallands Väderö) and E Denmark (Sölager and Rörvig) did deviate from this pattern, but their relative genetic distance from other samples (Figure 4e) was likely biased by small sample sizes and lowered genotyping success. The pattern was also stable when the structure analysis was rerun by a reduced number of 10 loci where the loci that did not amplify well were omitted (data not shown). In general, there was no difference in the level of inbreeding between genetic clusters ( Figure S4), but individual heterozygosity differed for all indices (PHt, χ 2 = 67.7, df = 4, p < .001; Hs_exp, χ 2 = 67.7, df = 4, p < .001; IR, χ 2 = 59.8, df = 4, p < .001; HL, χ 2 = 73.8, df = 4, p < .001). This analysis suggested the highest heterozygosity in cluster II ( Figure 5).

| Colonization history
From the 10 tested possible evolutionary scenarios ( Figure S1), scenario 9 (Figure 6a) emerged as the most likely scenario of the Baltic coast colonization by C. vincetoxici as deduced from the ABC computations. Posterior probability of this scenario was 0.67 after logistic regression on the 1% simulated data most similar to the observed data; however, probability of scenario 8 ( Figure S1) was also relatively high (Figure 6b). The probability with which scenario 8 was rejected although it was the true scenario was 0.32 (type I error; 500 test data sets simulated under selected scenario), and the probability of supporting this scenario even when it was not the correct one was 0.39 (type II error  (Figure 6c).

| DISCUSS ION
The distribution of an organism is the result of intertwined processes operating on a variety of spatial and temporal scales. Our study links the current distribution of the insect to land surface changes, host plant dispersal, and insect life history to draw conclusions on the insect's post-glacial colonization process. This multiangle approach is essential for understanding colonization routes and present-day genetic structure of the gall midge C. vincetoxici in Fennoscandia. We found relatively low genetic diversity and a high level of local inbreeding, most likely due to the insect's poor dispersal capacity, as evidenced also by a significant isolation-bydistance pattern. The presented results also suggest that the species colonized the study area via wind-assisted flights from remote areas as the land has gradually risen above the level of post-glacial Littorina sea. F I G U R E 6 (a) Schematic representation of the best-supported colonization scenario for Contarinia vincetoxici (scenario 9; competing scenarios see in Figure S1) inferred by the Approximate Bayesian Computation analysis (Ne, effective population size; N, initial size reduction; NA, unsampled founder population; t, time of divergence or admixture; db, demographic bottleneck; ra, admixture rate; for simulated parameters and estimated values see Section 2 and 3, respectively). (b) The comparative posterior probabilities of each scenario fitting the real data using logistic regression (please note that other scenarios overlap at the bottom of the plot). (c) Reconstruction of colonization history of the gall-midge along the Baltic coast in Fennoscandia founded by unknown source populations. Values show approximate years before present (crosses, samples not analyzed; see Section 2).

| Dispersal capacity
The most striking result from our study is the apparent isolationby-distance (IBD) pattern across the whole study area. Despite the insect being tiny (wing length ~1.5 mm; Widenfalk et al., 2002), it is apparently capable of dispersal that counteracts genetic drift even at distances up to 800 km. Further, we could not see any sign of the expected combination of very short dispersal with rare longdistance expansions. Rather, it appears that colonization of the northern distribution area has relied on a gradual and relatively strong dispersal capacity. A hierarchical population structure can easily be mistaken for a strong pattern of IBD, but the reverse is also true (Meirnmans, 2012). However, knowledge of colonization history can disentangle the problem. The uppermost hierarchical level in our approach inferred only two main clusters. This is in accordance with two independent introductions that appears to have occurred approximately 300 km apart, as detected by ABC analysis (~56°N vs. 58°N). Since further genetic clusters were inferred only in the substructures at shorter geographical distances we hypothesize that the presented genetic structure is due to colonization events rather than a consequence of IBD (Perez et al., 2018).

So how does this insect traverse the landscape? Observations on
C. vincetoxici and of gall midges in general agree with the picture of a mixed dispersal strategy. A slow short-distance dispersal seems to be the dominating colonization process when looked at in the moderate time perspective of the population ecologist. Expansions to previously unoccupied patches are uncommon and are usually confined to rather short distances from previously colonized patches (Solbreck & Widenfalk, 2020). However, occasional long-distance dispersal events in gall midge species seem to appear via aerial plankton (Johnson, 1969;Sylvén, 1970). Even though these long-distance movements may be too rare to be observed directly, they are obviously important in the long-time perspective, for example, for an initial colonization of the new area from remote source population.
In our case, this could be the transport of founder individuals from the mainland to Fennoscandia. We know that long distance aerial dispersal occurs in many small organisms. For instance, recently icefree lands after retreat of glaciers are colonized by aeroplankton of nematodes, mites, spiders, springtails, tardigrades, thrips, and other small insects (Ficetola et al., 2021) enabling gene flow between populations in very remote sites (Ptatscheck et al., 2018).  (Kalske et al., 2014;Tullberg et al., 2000) and it is not fed upon by any vertebrates. Drift with plant material on the sea surface (Mende et al., 2010) is slow and hazardous, and it is highly unlikely to provide the synchrony between insect and plant life cycles necessary for successful colonization. Finally, unintentional long transport of species by humans, especially plants with galls containing larva, is also very unlikely, as larvae readily leave disturbed galls.

| Colonization events
Our data indicate that long-distance dispersal should have occurred in this species. This is interesting from an evolutionary perspective as it is probably the way in which C. vintetoxici first arrived to the south of the Scandinavian Peninsula in 4.6 ka BP (cluster I). Revealed molecular dating also rejects the possibility that dispersal occurred through the last land bridge which connected south tip of the peninsula with the mainland yet 10.3 ka BP before formation of the Littorina/Baltic Sea (Björck, 1995

| Life history and population genetics
The prolonged larval diapause (of variable length) is an important aspect of the gall midge's life cycle, with considerable implications for its occurrence and population genetics. As our data show, the limited dispersal rate of the gall midge is not a hindrance to high occupancy in some sites. It appears that the high survival through long dormancy periods acts as a strong buffer against local extinction even in very small gall midge populations (Solbreck & Widenfalk, 2012. Thus, field observations suggest that even though colonization rates are low, once a local habitat patch is colonized it is highly likely to remain so. Population dispersal and establishment is dependent on the relative rates of colonization and of local survival and indicates that this so-called poor colonizer becomes an efficient survivor in the long run as "migration in time" compensates for limited migration in space.
Another aspect of the "low colonization-high survival" syndrome is the usually small size of host plant patches. This means that most local gall midge populations are also small, and small populations are generally subjected to high extinction risks.
However, only a small fraction of a local population emerges from diapause to reproduce each year thus being exposed to extinction hazards (Solbreck & Widenfalk, 2012, whereas the larvae remaining in diapause in the soil are sheltered. Any genetic variation is likely to be retained longer in this gene bank of diapausing larvae, which was confirmed in our study. As predicted, the local genetic variation was higher in older populations than in recent colonizations. However, the highest variation and the lowest inbreeding were found in the admixture (cluster II) of two different introductions to Fennoscandia, which coincided with subsequent colonization events towards north. In addition, extreme inbreed-

ACK N OWLED G M ENTS
We thank the following for financial support: Carl Trygger's Agency VEGA (2/0107/21 to PK). We are grateful to Magdalena M.
Buś for help with genotyping of samples, and we also thank Johan Wallén and Carol Högfeldt for assistance in the lab and Craig Primmer for discussion about preliminary data. We are indebted to three anonymous reviewers for valuable comments and suggestions that helped to improve a previous version of the manuscript.

CO N FLI C T O F I NTE R E S T S TATE M E NT
None declared.

DATA AVA I L A B I L I T Y S TAT E M E N T
Microsatellite genotypes are available from the Dryad Digital Repository: https://doi.org/10.5061/dryad.63xsj 3v70.